Return-Path: <icon-group-sender>
Received: (from root@localhost)
by baskerville.CS.Arizona.EDU (8.9.1a/8.9.1) id RAA24130
for icon-group-addresses; Fri, 30 Jul 1999 17:44:44 -0700 (MST)
Message-Id: <199907310044.RAA24130@baskerville.CS.Arizona.EDU>
From: gep2@terabites.com
Date: Fri, 30 Jul 1999 18:16:13 -0500
Subject: When Do You Keep A Quirk?
To: icon-group@optima.CS.Arizona.EDU
Errors-To: icon-group-errors@optima.CS.Arizona.EDU
Status: RO
> Gordon Peterson makes an excellent argument for moving to a less quirky
> definition of basename,
...thank you...
> ...going so far as to purposely break older programs that might depend on the
> quirks.
Right.
Some years ago, I was the programmer in charge of Datapoint's disk operating
system. Although the system was really quite reliable, we'd occasionally (though
only rarely) seen the Cluster Allocation Table (the bitmap of available clusters
on the disk) become corrupted. This would eventually result in clusters being
cross-linked to more than one file, obviously causing serious problems. Other
times, we'd simply "lose" space on the disk that was no longer allocated to any
file.
One day I was wondering how the file close function (and space deallocation,
including when a file was being deleted entirely) was able to run so quickly,
even for large files.
In our system, we had a fixed number of disk controller buffers, and a fixed
number of Logical File Table entries (think of these as being similar to the
data referenced by a "file handle"). When deleting a file (or the end, unused
part of a file that had been extended) the system loaded the Cluster Allocation
Table into disk controller buffer zero, and the RIB (Retrieval Information
Block) for the file (detailing the locations and sizes of the allocated pieces
for the file) into the disk controller buffer corresponding to the LFT entry in
use when the file was being closed. It then ran down through the appropriate
entries in the RIB, clearing bits in the CAT and rewriting the CAT and its
backup copy to disk when done.
As I thought about this, I realized that if one were using LFT entry zero to
close a file and deallocate space, the two blocks would be loaded into the same
buffer. The CAT would then also be used as if it were the RIB of the file being
closed, resulting in (sometimes) extraneous entries in the CAT being cleared.
(For a variety of technical reasons, probably not worth going into here, this
tended to occur most often when a file that had been on the disk for a long
time had been moved or replaced.)
I couldn't imagine a program actually doing such a thing as closing a file using
LFT entry zero, but just as a precaution I built into the system a feature that
would prevent the incorrect behavior by not allowing files to be closed and
their space deallocated when using that LFT entry.
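For readers who want to see the failure mode concretely, here is a toy sketch in Python (not Datapoint code). The data layouts are assumptions made only for illustration: the CAT is a list of 0/1 flags, the RIB is a list of cluster numbers, the buffer count is arbitrary, and the RIB is assumed to be loaded before the CAT. The `guard` flag stands in for the precaution described above.

```python
# Toy model of the shared-buffer hazard. The layouts here (CAT as a list of
# 0/1 flags, RIB as a list of cluster numbers) are assumptions for
# illustration, not Datapoint's actual formats.

NUM_BUFFERS = 4  # assumed; the message only says the number was fixed


def close_and_deallocate(lft_entry, cat, rib, guard=False):
    """Free the clusters listed in `rib` and return the updated CAT.

    With guard=True, LFT entry zero is refused, mirroring the precaution
    described in the message.
    """
    if guard and lft_entry == 0:
        raise RuntimeError("cannot deallocate via LFT entry zero")

    buffers = [None] * NUM_BUFFERS
    buffers[lft_entry] = list(rib)  # RIB goes to the buffer for this LFT entry
    buffers[0] = list(cat)          # CAT always goes to buffer zero (load order assumed)

    # Walk the "RIB" and clear the matching CAT entries. When lft_entry is 0,
    # both roles share one buffer, so CAT contents are misread as cluster
    # numbers.
    for cluster in list(buffers[lft_entry]):
        buffers[0][cluster] = 0
    return buffers[0]


cat = [1, 1, 1, 1, 0, 0, 1, 1]  # clusters 0-3 and 6-7 in use
rib = [6, 7]                    # the file being deleted owns clusters 6 and 7

print(close_and_deallocate(2, cat, rib))  # [1, 1, 1, 1, 0, 0, 0, 0]
print(close_and_deallocate(0, cat, rib))  # [0, 0, 1, 1, 0, 0, 1, 1]
```

In this toy, going through entry zero frees clusters 0 and 1, which belong to other files, while clusters 6 and 7 stay marked as allocated even though their file is gone.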
It wasn't long after that when, one afternoon, the programmer in charge of our
BASIC compiler came in, angry as all hell, pissed off at the change I'd made
and complaining that BASIC needed to use LFT entry zero when deleting files...
and BINGO, we'd found the cause of the mysterious CAT corruption. :-)
Anyhow, I tend to think that (if this BASENAME routine is actually used at all)
it's PROBABLY being used by people who are unaware of the bug, and using it
presuming that it works the way it sounds like it's supposed to work. The
"quirky" behavior (i.e. LONGSTANDING BUG) probably results in a lot of "quirky
behavior" (i.e. LONGSTANDING BUGS) in the programs which use BASENAME, too.
When changing such an implementation detail, I'd think that a reasonable way to
handle that is to carefully document (preferably in a source file comment,
perhaps also in a more global README file or something) the "bug" that was
fixed, examples of previous and new behavior, and let it go at that. The other
idea (about adding a mandatory third parameter) would be one way to make SURE
that programs using the routine had been made aware of the fix, and that someone
responsible for each such program had (hopefully) considered the behavior and
what mode they genuinely wanted. (Leaving each of the perhaps-desired modes in
the routine, in this case, is no big deal and results in a routine only slightly
larger than it would be otherwise... therefore I don't see this as being a huge
issue).
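A minimal sketch of the mandatory-third-parameter idea, in Python rather than Icon. This message does not spell out the exact quirk, so the difference between the two modes below is an assumption chosen purely for illustration: in the hypothetical "old" mode a suffix equal to the whole name is left alone, and in the hypothetical "new" mode it is stripped. The mode names are also made up.

```python
def basename(path, suffix, mode):
    """Hypothetical three-argument basename; `mode` has no default on purpose.

    Every caller must choose "old" or "new", so nobody picks up a
    behavior change without having looked at it.
    """
    if mode not in ("old", "new"):
        raise ValueError("mode must be 'old' or 'new'")

    # Drop trailing slashes, then keep whatever follows the last slash.
    name = path.rstrip("/").rsplit("/", 1)[-1]

    if suffix and name.endswith(suffix):
        # Assumed quirk: "old" mode refuses to strip a suffix that is the
        # entire name; "new" mode strips it regardless.
        if mode == "new" or name != suffix:
            name = name[: len(name) - len(suffix)]
    return name


print(basename("/usr/src/prog.icn", ".icn", "new"))  # prog
print(basename(".icn", ".icn", "old"))               # .icn
```

Because `mode` is required, a call written against the two-argument form fails immediately (a `TypeError` in Python) instead of silently switching behavior.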
> I am totally sympathetic to this point of view. I had a friend that worked on
> the ANSI C standards committee. He frequently lamented that there were proposed
> changes to C that every committee member agreed would make for a better
> language, but were voted down because they would break large, existing
> applications. One has to weigh the cost of revising old programs against the
> possible gains in new programs.
That's perhaps a very different issue, since there it may be much more difficult
to provide BOTH functionalities as options. And here we're not talking about a
core language issue anyhow... BASENAME is just an IPL routine (not even
part of the standard compiler runtime or anything) and the IPL is primarily used
as EXAMPLES of how stuff is done in Icon. Although certainly some of the other
IPL routines use it (and in fact, that might be an interesting exercise... to
see if any of those programs count on the quirky behavior or leave it as just
another latent bug) it's not clear that the routine is very widely used
otherwise. (And if, indeed, the correction would FIX a latent bug in each of
the IPL sources which call BASENAME, then that would reinforce the decision to
fix it, and reduce the worry about programs counting on the anachronistic,
buggy, Unix-y behavior).
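The exercise of finding which sources call the routine is easy to start mechanically. The following Python sketch lists the lines in `.icn` files that appear to call `basename`. The directory layout and the `find_callers` helper are hypothetical, and the comment stripping is crude (it cuts at the first `#`, which can also occur inside string or cset literals), so treat the output as a list of places to read, not a verdict.

```python
import re
from pathlib import Path

# Matches "basename(" as a whole word, allowing spaces before the parenthesis.
CALL = re.compile(r"\bbasename\s*\(")


def find_callers(root):
    """Return {file path: [line numbers]} for .icn files that mention a call.

    The procedure definition itself ("procedure basename(") also matches,
    so the file that defines the routine will show up too.
    """
    hits = {}
    for path in sorted(Path(root).rglob("*.icn")):
        lines = path.read_text(errors="replace").splitlines()
        found = [
            number
            for number, line in enumerate(lines, 1)
            if CALL.search(line.split("#", 1)[0])  # ignore text after '#'
        ]
        if found:
            hits[str(path)] = found
    return hits
```

Each reported line still has to be read by a person to decide whether the caller depends on the old behavior or merely carries a latent bug.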
> In the case of basename, there are over a dozen IPL sources that use
> basename. If these IPL programs have not been tested with the new version of
> basename, I would be very concerned about this change, for this can affect
> programs that are not even using basename directly.
Are those IPL sources complete stand-alone programs, or lower-level functions
likely to have been designed into larger programs or systems?
> Of course, it is very hard to determine how much non-IPL code out there uses
> basename, or how many programs depend on the quirks.
Yes. But there are probably at least as many programs which have latent bugs as
a result of the quirks... and I'd tend to believe that a program would count on
the routine behaving the way one would EXPECT it to behave (i.e. without bugs)
based on the description of what the routine is supposed to do.
Again, I'll point out that I'm not aware that the routine has ever been
documented as behaving with all the same bugs and flaws as the anachronistic
Unix routine historically called "basename" (and that's a VERY generic name, not
something like "fsprintx" or the like), and that in this business we don't have
eternal copyrights or trademarks on function names and ancient (buggy!)
implementations based on those names.
> As much as I prefer the new version of basename, making this change in the
> short run is too risky.
You're entitled to your opinion... as are all the rest of us too.
> If and when we do wish to change the behavior of basename, there is a way to
> ease the migration path. We can write a procedure old_basename that has the
> old behavior, then add the following line to old source files:
>
> $define basename old_basename
Personally, I don't see that that makes a whole lot of sense. The change is
cosmetic at best, and more confusing to programmers who (forever) look at a
source file being edited and see 'basename' without being made aware of the
macro substitution. If you're gonna change the routine, then let's change it
and be done with it. At least adding a third (required) parameter would help
make it more obvious to programmers that this routine is NOT an Icon-implemented
clone of the archaic Unix "basename".
> One final point: could we please cut down on ending Icon group mailings with
> political or religious messages? I say this not because I disagree (or
> agree) with the religious / political / aesthetic views of the members of
> the group. It's just that the purpose of the group is to exchange
> information on the Icon programming language. This is not the place for
> discussion of these other topics.
I presume the writer is commenting on my sig file, which makes a statement
against SPAM E-mails, as well as reminding people of the **shameful** way in
which our congresscritters, sent to Washington promising to represent their
constituents, instead brazenly participated in the most outrageous orgy of
partisan, politically motivated bullshit (and CLEARLY contrary to the expressed
will of the majority of the voters) that this country has seen in as far back as
*I* (at almost 50) can remember. While I do agree that this is NOT the forum to
dwell on discussions of that occurrence, I'll point out too that sig files are a
VERY longstanding way for people on the Internet to voice their opinions on a
variety of subjects (sort of a "bumpersticker for E-mail") and as such are an
important element of free speech. If you don't like to put your stand on such
issues in your SIG file, then feel free not to. But don't bitch about other
people who do so, just because it makes you uncomfortable to be reminded about
the kind of shameful bullshit that some politicos in this country seem to think
the public will forget about prior to the next elections. Some of us have
longer memories than that!!!
Gordon Peterson
http://web2.airmail.net/gep2/
Support the Anti-SPAM Amendment! Join at http://www.cauce.org/
12/19/98: the day the Conservatives demonstrated their scorn for their
fraudulent sham of representative government. Voters, remember it!